PREFACE

This report covers the beginning of a separate phase of human factors test
methodology development of the US Army Tropic Test Center. While past efforts have
concentrated on human performance in the humid tropics (i.e., vision, audition,
portability/load carrying, land navigation ability, rifle-fire accuracy), this effort turns to
the subjective domain of materiel evaluation. The work was supported by the US Army
In-House Laboratory Independent Research (ILIR) Program.


I. INTRODUCTION


In tests of new military hardware, the Army has traditionally placed high value on the
acceptance and preferences of user personnel. Hence, the concepts of troop tests, service
tests, force development and operational tests have evolved, as distinguished from
engineering and developmental tests. The latter two yield quantitative indices of hardware
performance functioning, reliability and maintainability. The former are often assessed by
subjective methods, and it is widely recognized that the subjective evaluations are critical to
the deployment of new hardware items. They add the important human factor which is
independent of engineering data. At the same time, subjective measures arouse suspicion and
uneasiness among many system evaluators. Subjective measures are prone to sources of
error, to include biases of interviewer and interviewees, resistance to change, sheer
disinterest of test participants and the classical errors of halo, horn, hello-goodbye, central
tendency, acquiescent response sets, and many more (Guilford, 1954).

The  Army  has  adopted  two  general  approaches  to  resolve  the  problem.  The  first 
approach  is  to  improve  data  acquisition  techniques  in  obtaining  information  from  soldiers. 
This involves making the subjective techniques more systematic. Since the 1930's, much
effort has been expended and many improvements have been made in the development of
structured interview techniques, standardized questionnaire development, rating scales, panel
evaluation, and checklists. However, there have been no true state-of-the-art advances since
the 1940's, when the "forced-choice" evaluation technique was developed for personnel
assessment. The questionnaire and rating scale technology being used in the 1970's is
substantially the same as that in use during World War II.

The  second  approach,  which  has  great  popular  appeal,  is  to  make  the  subjective 
evaluations  more  objective.  That  is,  the  human  factor  is  approached  from  a quantitative 
viewpoint.  Instrumented  performance  courses  have  been  developed  to  measure  factors  such 
as  speed,  accuracy,  completeness,  and  relevance  for  a great  variety  of  military  tasks. 
Physiological  indicators  such  as  heart  rate,  body  temperature,  and  basal  metabolism  have 
also  been  widely  used  in  performance  assessment.  However,  hundreds  of  studies  have  shown 
that  objective  performance  measures  do  not  predict  the  subjective  expressions  of  test 
participants.  Rather,  performance  measurement  has  furnished  an  indispensable  but 
independent  measure  of  the  human  factor.  The  technology  of  subjective  assessment  has 
come  to  a virtual  standstill. 

Over the past 25 years, a successful small-scale and low-visibility program has begun to
show promise in the area of subjective measurement. The work has been carried on in
various university laboratories in the general area of "psychophysical scaling." The intent of
the present study is to transfer this technology to Army materiel testing.

In  the  process  of  describing  human  performance  demands  of  new  complex  systems,  it 
is  important  to  recognize  the  existence  of  performance  problems,  to  identify  their  source, 
and  to  measure  their  magnitudes.  By  eliminating  or  reducing  the  magnitude  of  the  problem, 
the  overall  efficiency  of  a system  may  be  increased.  But  the  first  step  in  eliminating  a 
problem is often a subjective report of its existence. An operator or controller of a new
system may express difficulty in its "handling," but not be able to pinpoint the source of
the problem or directly measure its magnitude. Typical subjective measurement scales
produce "category-scaled" data with units that are ordinal at best (allowing rank-order
comparisons, but not statements as to amount of difference or absolute level) (Stevens, 1975).
Category-scaled data may be contrasted to data derived from more scientific methods of
systematic measurements that produce "ratio-scaled" data (having an absolute zero and
units that may be legitimately manipulated mathematically).
II. PROBLEM AND OBJECTIVE


This investigation attacks the problem of obtaining a quantitative subjective measure of
effectiveness (SMOE) during developmental testing of Army materiel. Typical category
scales of current subjective questionnaires provide ordinal data that can neither be
manipulated mathematically beyond simple summation nor be analyzed statistically
beyond nonparametric tests of partitioned responses. This investigation will develop
ratio-scaled procedures for obtaining subjective questionnaire responses. The scientific
method may then be applied to subjective data. SMOE may be delineated more precisely.

III. BACKGROUND

Current guides for subjective questioning, questionnaire design, and data analysis
include a wide variety of techniques. Each technique, although well used and very useful up
to a point, has the same scaling problem in varying degrees (TECOM Pam 602-1).* Except
for free-response subjective questioning, the problem is that respondents are forced to
conform to preset scales, therefore losing the freedom to respond more sensitively and
precisely according to their feelings and opinions. Free-answer or open-ended questioning is
useful in exploratory studies where restrictions in response form may inhibit expression of
potentially important personal insight, or is useful as a follow-up technique for amplifying
or explaining scaled responses. In either case, resulting verbalizations are of more use in
formulating questions than in documenting response levels; response scaling methods do not
apply to the problem of this investigation. Questionnaire designs other than the open-ended
type contain specific questions, each requiring a respondent to conform to a preset response
mode. The most basic of these is the dichotomous mode where the response is the
equivalent of yes or no (sometimes including a third don't know option). The constraint of
the dichotomous response provides no sensitivity to the degree of "yesness" or "noness"
that the respondent may be able to express. Although gestures or verbal comments may
qualify these responses at the time they are recorded, analyses of the data are denied such
advantages and are limited to the oversimplified response split.

The next level of response sophistication contains a host of categorically scaled
mechanisms including multiple choice or checklist responses (where one or more of a
number of alternative nominal categories are to be checked in preference to others), and
rating scales (where the respondent is to select one category from an ordered series such as
no problem, very little problem, somewhat difficult, or very difficult). Rating categories are
verbal, numerical, or both. Some are composed of a numerical scale (rarely over 10 points
on the continuum) combined with a verbal anchor at each of the extremes, but not at the
middle points, such as:

    1      2      3      4      5      6      7      8      9      10

  Agree                                                         Disagree
 Strongly                                                       Strongly

Although rating scales are used to gain a degree of sensitivity to variation in possible
responses, and are superior to dichotomous scales in that respect, rating scales are
nonetheless ordinal, with some (based on standardized phrases) achieving a quasi-interval
nature. The points along rating scales are designed or assumed to be evenly spaced for
purposes of analyzing response levels. However, unless the response categories of a rating
scale have been shown to be equally spaced by way of standardized weighting procedures,
interval scaling cannot be assumed and the more powerful parametric statistics are
inappropriate.
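
The dependence of parametric summaries on category spacing can be shown with a small numerical sketch. The ratings and the alternative coding below are invented for illustration and are not data from this investigation. The sketch suggests that the ordering of two means can reverse when ordinal categories are recoded with a different but order-preserving set of numbers, whereas the ordering of the medians does not change.

```python
from statistics import mean, median

# Hypothetical ratings of two items on a four-category difficulty scale,
# coded 1 (no problem) through 4 (very difficult). Invented for illustration.
item_a = [2, 2, 2, 2]
item_b = [1, 1, 1, 4]

# An equally defensible ordinal coding: same order, unequal spacing (assumed).
recode = {1: 1, 2: 2, 3: 3, 4: 10}

def summarize(ratings, coding=None):
    """Return (mean, median) of the ratings, optionally after recoding."""
    values = [coding[r] for r in ratings] if coding else list(ratings)
    return mean(values), median(values)

mean_a, median_a = summarize(item_a)
mean_b, median_b = summarize(item_b)
mean_a2, median_a2 = summarize(item_a, recode)
mean_b2, median_b2 = summarize(item_b, recode)

# With the 1-4 coding item A has the higher mean; after recoding item B does.
mean_order_flips = (mean_a > mean_b) != (mean_a2 > mean_b2)
# The medians keep the same order under both codings.
median_order_flips = (median_a > median_b) != (median_a2 > median_b2)

print("mean order changes with coding:", mean_order_flips)
print("median order changes with coding:", median_order_flips)
```

Because either coding respects the category order equally well, a conclusion based on the means depends on an arbitrary choice unless equal spacing has been demonstrated.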

*TECOM Pamphlet 602-1, Vol. I, Questionnaire and Interview Design (Subjective Testing Techniques), 25 July 1975.
A good example of the problem of this investigation may be seen by examining the
nature of a type of subjective questioning that was left out of the foregoing paragraph for
this purpose: the ranking question. A typical ranking question presents a number of items
that the respondent is asked to put in rank order according to his personal preference. An
example is:

Rank the following types of helmets in the order of your preference.
(1 = most preferred, 2 = next preferred, etc.)

(a) New, Type I     ____
(b) New, Type II    ____
(c) New, Type III   ____
(d) Standard        ____

A particular soldier who happened to like the new, Type I, helmet best would put the
number 1 on the first line, followed perhaps by ranks of 4, 3, and 2 for the remaining items,
respectively. Analyses of the data are limited to those appropriate for ordinal measurement
because it cannot be inferred from the ranks, for instance, how much more one helmet
is preferred to the others. But suppose we lifted the restriction in range of assignable
"ranks" from 1 through 4 to 0 through any number the soldier wanted to use. Furthermore,
because our society generally associates "largeness" with "goodness," we could turn the
scale around and ask the soldier to think of zero as representing the least preferred helmet
imaginable to him, and place no restriction on his assignment of a number to the helmet
that is most preferred on his own scheme of preference.* Then the soldier whose preference
for the new, Type II, helmet was extremely low, but not as low as something else that was
not on the list, could rate new, Type II, as 9; new, Type III, as 100; the Standard as 210;
and, if new, Type I, were far out on his own preference scale, he could rate it as 9000 if that
number represented the way his preference ran.† By transforming the rating scale into an
unconstrained numerical field and allowing the subject to match numbers to his feelings, a
higher order of scaling occurs. The scale not only goes from ordinal to interval (where we
can say that the difference between new, Type II, and new, Type III, preference was
91 = 100 - 9), but also goes to ratio (where a real zero preference level allows us to say that
his preference for the new, Type I, helmet was 1000 times greater than for new, Type II,
and 43 times greater than for the Standard helmet).†
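
The arithmetic of the helmet example can be reproduced in a few lines. The following is a minimal sketch using only the four numbers given in the text; the function names are illustrative and are not part of any procedure described in this report. It contrasts the ordinal summary (ranks) with the interval-level and ratio-level statements that the unconstrained numbers permit.

```python
# Magnitude estimates from the helmet example (0 = least preferred helmet
# imaginable; no upper limit on the number a soldier may assign).
estimates = {
    "New, Type I": 9000,
    "New, Type II": 9,
    "New, Type III": 100,
    "Standard": 210,
}

def difference(a, b):
    """Interval-level statement: by how many scale units a exceeds b."""
    return estimates[a] - estimates[b]

def ratio(a, b):
    """Ratio-level statement: how many times greater a is than b."""
    return estimates[a] / estimates[b]

def ranks():
    """Ordinal summary (1 = most preferred); the magnitudes are discarded."""
    ordered = sorted(estimates, key=estimates.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}

diff_iii_ii = difference("New, Type III", "New, Type II")
ratio_i_ii = ratio("New, Type I", "New, Type II")
ratio_i_std = ratio("New, Type I", "Standard")
rank_order = ranks()

print(diff_iii_ii, ratio_i_ii, round(ratio_i_std, 1), rank_order)
```

The ranks recovered from the estimates (1, 4, 3, 2 for Type I, II, III, and Standard) match the ranking example, but the ranks alone cannot be turned back into the differences or ratios.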

The foregoing illustrated the problems associated with category scaling of subjective
questionnaire responses. The last example introduced the psychophysical measurement
approach, cross-modality matching, that this investigation will take toward their solution.

*The exact wording of the question and the basis of his preference (weight, shape, balance) would be important issues to
resolve in an actual test, but need not be addressed here in a discussion of response scaling.

†Again, problems surrounding question wording, practice, varying ranges among individuals, and data reduction techniques
are areas of needed research, but the scaling concept may be discussed separately from their solution at this time.

IV. APPROACH

The general approach to the problem is to apply the ratio scaling techniques of
psychophysical cross-modality matching to subjective questioning. Ratio scaling is a fairly
recent state-of-the-art advancement in psychological measurement. For over 200 years,
psychologists and physicists have been building a case that the intensity of a stimulus and
the magnitude of the sensation were related according to a logarithmic function. However,
in the last 25 years S. S. Stevens, among others, showed that the logarithmic function is
biased for most continua because data were collected through partitioning
procedures, that is, by judging subdivisions or apparent differences through such methods as
just noticeable differences (JNDs), rather than by judging ratios.

     "On prothetic continua (those continua concerned with intensity or
     amount), the variability, and hence the JND, tends to increase in
     proportion to the magnitude. Consequently, the counting off of
     prothetic JND leads to a logarithmic function. When partitioning
     procedures (including bisection and category scaling) are applied to
     prothetic continua, there results a biased function, a function that is
     curved relative to the scale of magnitude determined by ratio scaling
     procedures.

     "Scales of perceptual magnitude may be created by asking observers to
     match numbers to stimuli. Beginning in 1953, it was shown that on
     prothetic continua the perceived magnitude increases as a power
     function of the stimulus magnitude. Each modality has its own
     exponent, although the value of the exponent may change with
     adaptation, contrast and other parameters of our experiment. The
     exponent of the power function determines the curvature of the
     function. The basic principle that underlies the power law is that equal
     stimulus ratios produce equal sensation ratios." (Stevens, 1975, pp. 35-36)

So, the useful basic concept behind psychophysical scaling is that "measurement" is a
process or procedure that can be applied to sensations, perceptions, or subjective
questionnaire responses. Measurement is much broader than counting or enumerating things
in terms of a physically countable unit. Stevens (1975) proposed "matching" to be the basis
of all measurement; counting was explained as a special case of matching, where words and
numerals have come to be substituted for the original procedure of matching pebbles,
notches on a stick, or tallies to the items of interest (measuring numerosity). Stevens
regarded measurement as a "two-part endeavor, consisting on the one hand of manipulations
and on the other of models." He explained the measurement procedure as a "schemapiric
enterprise . . . the schematics of mathematics and the empirics of laboratory operations.
Mathematics can mirror manipulations, but it no longer legislates their freedom. We now
recognize that measurement extends to wherever . . . we can invent systematic rules for
pinning numbers on things." When the rules involve a procedure for directly matching a
perceived magnitude along one continuum to a perceived magnitude on another continuum,
the magnitude of the sensation has been shown to be a power function of the stimulus, and
a ratio-scaled measurement results (Stevens, 1975).

In  order  to  understand  the  nature  of  the  problem  addressed  by  this  investigation,  and 
the  scope  of  the  methodology  stated  in  the  next  section,  the  following  measurement  and 
psychophysical  terms  and  relationships  are  offered.  They  have  been  compiled  from  a review 
of  approximately  200  articles  and  books  published  in  the  area  of  psychophysics  in  the  last 
five years. Terms and examples not referenced upon their initial use are the authors'. In an
attempt  to  bring  the  cascade  of  terms  into  some  perspective,  they  have  been  placed  (forced 
in  some  cases,  perhaps  beyond  the  limits  of  their  original  intent)  into  a tentative  taxonomy 
that  will  undoubtedly  change  as  investigations  progress.  The  terms  are  presented  first,  the 
taxonomy  follows. 



Prothetic (Stevens, 1975): Refers to quantitative continua on which the degree of a stimulus
or response may be scaled. The stimulus-response (SR) is additive, allowing a measurement
of "how much" or "how much more" a stimulus is presented or a response is made.
Contrasted to metathetic.

Metathetic (Stevens, 1975): Refers to qualitative, positional continua on which different kinds
of sensations may be categorized. Positions on the continuum are independent, allowing
substitutive measurement of "different from in kind." Stevens gives an example that
". . . sweet is (metathetically) different from sour, although both may vary (prothetically)
from strong to weak."

Heterothetic: Refers to an SR relationship wherein both prothetic and metathetic continua
must be measured for its description.

Interoceptive [citation illegible]: Refers to subjective, judgmental aspects of a sensation for
which no direct physical measurement is appropriate for all individuals. May be
metathetically scaled to distinguish basic properties. May be prothetically scaled to
distinguish among levels of intensity. Examples are anxiety, hunger, anger, thirst, and
fatigue.

Exteroceptive [citation illegible]: Refers to objective, physical aspects of a sensation for which
reliable data may be fixed to the stimulus. Is prothetically scaled. Examples are brightness,
loudness, heat, and weight.

Heteroceptive: Refers to an SR relationship wherein both interoceptive and exteroceptive
continua are necessary for its description.

Intensive: Designates investigation of a single (one) stimulus or response that may be either
simple (consisting of one prothetically measured part) or complex (consisting of a set of
more than one interrelated, prothetically measured component parts); contrasted to
"extensive." A suggested example of a simple intensive stimulus is a straight line measured
by its length. An example of a complex intensive stimulus may be brightness, as measured
by flash duration and luminance (Marks, 1974). Prothetic measures must be stated in order to
ensure that component parts are totally accounted for and can be interrelated. Otherwise,
the existence of an undiscovered unrelated part may require a redesignation to "extensive."
Designations of intensive are based on "current knowledge" and are therefore tentative at
best.

Extensive: Designates investigation of compound (two or more unrelated) stimuli or
responses, each of which may be either simple or complex. An example is the size,
operability, portability, maintainability, and safety of a weapon.
Tentative Taxonomy Based on Selected Psychophysical Terminology

Type of             Scale          Nature of        Nature of        Research
Measurement         (Continua)     Stimulus (S)     Response (R)     Product

Psychophysical      Intensive      Single           Single           Relationship
(Stevens, 1975)     heteroceptive  simple           simple           between S & R
                    prothetic      physical         sensations or    (power function
                                   (measured)       perceptions      exponent)
                                                    (measured)

     Example: Stevens (1975) provides many examples of power function
     exponents ($\beta$), where the R magnitude ($\psi$) grows as a power function of
     the S magnitude ($\phi$) in the form $\psi = K\phi^{\beta}$, where $K$ is a constant
     dependent on the units of measurement. For perceived loudness of a
     3000 Hz tone, $\beta = 0.67$; for discomfort from whole body irradiation,
     $\beta = 0.7$; for perceived heaviness of lifted weight, $\beta = 1.45$.

                                   * * *

Psychosensory       Intensive      Physical         Single           Relationship
(Marks, 1974)       interoceptive  (measurement     complex          among R components
                    prothetic      irrelevant)      sensations       (equation valid for
                                                    (measured)       all levels of S)

     Example: Marks (1974) used the example that the loudness of a sound
     heard by two ears ($L_b$) equals the sum of the loudness heard by the left
     ($L_l$) and right ($L_r$) ears ($L_b = L_l + L_r$). The magnitude of the stimulus is
     irrelevant and need not be measured to determine the psychosensory
     function.

                                   * * *

Sensory-physical    Intensive      Single           Sensations or    Relationship
(Marks, 1974)       exteroceptive  complex          perceptions      among S components
                    prothetic      physical         (measurement     (equation valid for
                                   (measured)       irrelevant)      all levels of R)

     Example: Marks (1974) also provided the example of Bloch's law of
     temporal summation, wherefrom constant brightness ($K_b$) is the
     product of flash duration ($t$) and luminance ($L$); i.e., $L \times t = K_b$. The
     magnitude of the response level on a scale of brightness is irrelevant and
     need not be measured by psychophysical methods to determine the
     sensory-physical relationship.

                                   * * *

Psycho-attitudinal  Extensive      Compound         Compound         Collective evaluation
                    heteroceptive  complex          simple           from diverse separate
                    heterothetic   physical and     attitudes        R elements (summary &
                                   situational      (measured)       analysis of R pattern)
                                   (measurement
                                   irrelevant)

     Example: Evaluating various aspects of attitude of a population toward
     an object or situation, such as assessing troop acceptability of a new
     military item used in a harsh environment. Psychophysical
     measurement methods are used to obtain multiple ratio-scaled
     questionnaire Rs. Ratio scaling provides level and pattern of attitudes
     toward object or situation; provides base for comparisons across objects
     or situations.

                                   * * *

The preceding taxonomy separates types of measurement on the bases of the degree of
complexity and duplicity of the stimulus and response, and whether or not the stimulus and
response must be measured. For psychophysical measurement, both the stimulus and
response must be measured; each is single and simple; measurement relates the magnitude of
one stimulus to the magnitude of one response. For psychosensory measurement, only the
response is measured; the response is single but complex; measurement relates the
magnitudes of the component parts of the response. For sensory-physical measurement,
only the stimulus is measured; the stimulus is single but complex; measurement relates the
magnitudes of the component parts of the stimulus. For psycho-attitudinal measurement,
only the responses are measured; the responses are compound and simple; measurement
describes the magnitudes of diverse responses that may or may not be related. It is within
the final or psycho-attitudinal type of measurement of the preceding taxonomy that the
current investigation lies. The reason is that the goal of this investigation is to develop a
subjective measure of effectiveness (and associated instrumentation) that would require a
single procedure for measuring a series of attitudes, the natures of which may be quite
different (compound attitudes). The other three types of measurement are aimed at
intensive investigations of single sensations or perceptions, measured by highly specialized
procedures and instrumentation that may be of little value across the many responses to a
questionnaire.

V.  METHOD 

SUBJECTS 

During initial SMOE developmental stages of each of the response modes, in-house
personnel will be used on the basis of their availability. During validation stages, random
samples of in-house personnel will be used in addition to representative troops from the
193d Infantry Brigade (Canal Zone). It has been found that a group of 10 individuals
provides data stable enough for validating psychophysical power functions [citation illegible].

During field trials, test subjects will be the personnel who operate, maintain, or are
otherwise involved in active tropic testing of materiel from whom subjective questionnaire
responses would normally be obtained.

PROCEDURE 

General  SMOE  Development  Program.  To  show  how  the  specific  procedures  of  this  report 
fit  into  the  longer  term  objectives  of  the  SMOE  development  program,  it  will  be  helpful  to 
outline  the  general  program  procedure  first. 
a. Modal development is the first stage. Each response mode investigated must
undergo a modal development stage where materials and methods will be established, trial
tested, and honed to a point where validation of modal response patterns may be attempted.

b. Psychophysical validation will be the second stage of the program. Many response
modes have been used by psychophysical experimenters over the last 25 years. Each has an
associated power function exponent that has been replicated many times. The established
exponents, then, may serve as criteria against which the materials and procedures developed
in the previous stage may be validated. For instance, it has been determined that when a
person draws lines on a paper to represent the magnitudes of numbers spoken to him, the
lengths of the lines are in a 1:1 ratio to the magnitude of the numbers he hears. Therefore, if
producing a line were considered as a useful way of gauging the intensity of an attitude (or
each of a series of attitudes as on a questionnaire), then the procedure for producing the line
(exact instructions, size of paper) should be shown to yield a 1:1 relationship to the
magnitude of spoken numbers as an initial calibration step. Similarly, if producing a
matching tone were to be a basis for measurement, the calibration step would be to replicate
the 0.67 power exponent found to exist in magnitude estimations of tones. Response modes
that do not compare favorably to appropriate criterion values will then be recycled through
the development stage as many times as necessary to ensure that the procedures for
obtaining subjective responses via the mode in use do, indeed, produce ratio-scaled responses
with acceptable power function exponents. Figures 1 through 4 show various methods of
ratio scaling.


[Figure 1 schematic: rheostat and battery circuit]

Figure 1. Ratio Scaling by Varying Voltage. A subject could control a voltage from some
          minimum to some maximum by changing the position of a rheostat. His response
          would be read as a number on a digital voltmeter.

[Figure 2 schematic: variable frequency source]

Figure 2. Ratio Scaling by Use of Frequency Control. A subject would control a variable
          frequency source. By listening with headphones he could set his response
          according to the highness or lowness of the frequency. His response would be read as
          a number on a frequency counter. The number could vary from zero to the limit
          of the frequency source.

[Figure 3 schematic: loudness control circuit]

Figure 3. Ratio Scaling by Use of Loudness Control. A subject would move a volume control
          and set the loudness of a sound in accordance with his likes or dislikes. For
          example, the louder the sound the more he likes or dislikes an item. His response
          could be read as a number either on a voltmeter or a sound-level meter [citation illegible].

[Figure 4 schematic: number selector with digital readout]

Figure 4. Ratio Scaling by Number Selection. A subject could indicate preference by
          selecting any number between 1 and 10 and observe his selection on a digital LED
          readout.


c. Psycho-attitudinal validation is scheduled to follow the psychophysical
validation discussed above. Each response measurement mode will be tested for its validity
to measure known amounts of difficulty to perform soldier-item tasks that are
representative of subjectively measured human factors aspects of tropic testing. In order to
ensure that all important human factors aspects are covered in this stage of the investigation,
two mock test items will be built. The purpose will be to ensure that the amount of
difficulty to perform tasks, the stimuli, can be controlled and measured to provide known
criteria. Each mock item will be identical to the other, except for superficial aspects that
make one appear to be a test item and the other to be a control item. The nature of the
items may not be unlike a chemical-biological shelter system used in the tropics. Various
soldier-item interfaces (such as force to open a door, weight of movable components, light
levels, noise levels, control manipulation force, temperature, clarity of operating manuals)
will be set at different known levels in each of the items (with the difficult interfaces not
necessarily being all in the same item). At this stage of SMOE development, combat troops
who would normally use such an item will be tested for their subjective responses to
difficulty in performing the tasks. Ratio-scaled subjective responses obtained through the
mode(s) being developed may be compared to known, preset levels of difficulty in carefully
designed and controlled experiments. Examples of types of validity and reliability studies
that may be conducted are: ability of a psychophysical response mode to reflect various
known levels of difficulty; sensitivity of a response mode to small differences in preset levels
of difficulty (at low levels, intermediate levels, and high levels of difficulty); stability of
response level when preset levels are identical in test and control items tested at the same
point in time; stability of response levels over varying lengths of time between trials on the
same item, set to the same level each time; comparisons among various response modes,
including typical category scales, on all of the above; and reaction of troops to measurement
methods.

d.  Field  validation  will  be  conducted  after  the  various  response  modes  have  been 
validated  and  compared  as  outlined  in  the  preceding  paragraphs.  The  most  suitable  modes 
will  be  tested  in  the  field  during  regularly  scheduled  tropic  tests  of  materiel  items. 
Comparison  of  combat  troop  response  to  test  items  will  be  made  using  data  from 
ratio-scaled  SMOE  and  data  obtained  from  typical  category-scaled  techniques.  An  example 
of field validation would be a series of simple experiments using two items, standard and
new; say, entrenching tools: the standard "old" and NARADCOM's "new." Have 20 soldiers dig
two holes each; then use a potentiometer to compare preferences. Also use one or two
paper-and-pencil scales; analyze for: (1) reliability of ratio scaling from soldier to soldier; (2)
correlation between ratio scaling and paper/pencil scales.

e.  SMOE  modeling  will  be  performed  with  techniques  that  prove  to  be  effective  for 
obtaining  ratio-scaled  subjective  responses  for  a variety  of  typical  materiel  items  scheduled 
for  tropic  testing.  Techniques  will  be  formalized  into  standard  test  operation  procedures  and 
associated  instrumentation  suitable  for  use  throughout  the  Army. 
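
The calibration check in stage b and the ratio-versus-category comparison in stage d can both be sketched numerically. The following is a minimal sketch under stated assumptions, not the procedure this program will adopt: the exponent is estimated as the least-squares slope of log response on log stimulus, the 0.05 tolerance is an arbitrary placeholder, the correlation is a rank (Spearman) coefficient chosen here because the paper-and-pencil scale is ordinal, and all data values are invented.

```python
import math

def fit_power_exponent(stimuli, responses):
    """Least-squares slope of log(response) on log(stimulus).

    If responses follow k * stimulus ** b, the slope estimates b.
    All values must be positive.
    """
    xs = [math.log10(s) for s in stimuli]
    ys = [math.log10(r) for r in responses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def meets_criterion(exponent, criterion, tolerance=0.05):
    """True if the fitted exponent is within an assumed tolerance of the criterion."""
    return abs(exponent - criterion) <= tolerance

def average_ranks(values):
    """Rank from 1 upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman(xs, ys):
    """Rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Stage b (hypothetical data): line lengths drawn to match spoken numbers.
spoken_numbers = [1, 2, 5, 10, 20, 50]
line_lengths_mm = [3.0 * n for n in spoken_numbers]
line_exponent = fit_power_exponent(spoken_numbers, line_lengths_mm)
line_mode_calibrated = meets_criterion(line_exponent, criterion=1.0)

# Stage d (hypothetical data): one soldier's ratio-scaled and category
# responses to the same five questions.
ratio_scores = [12, 300, 45, 80, 150]
category_scores = [1, 5, 2, 3, 4]
rho = spearman(ratio_scores, category_scores)

print(round(line_exponent, 3), line_mode_calibrated, round(rho, 3))
```

A response mode whose fitted exponent falls outside the chosen tolerance would be returned to the modal development stage, as described in stage b.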

Program  Application. 

a. As an example of how the SMOE program would work, let us consider a typical
situation in which a test item is compared with a standard item: say, a new protective
fragmentation vest or helmet, where item acceptance relies heavily on subjective data from
troops. The comparison is generally required in several environments (temperate, humid
tropic, arctic, and desert), in numerous tactical situations (attack, defense, parachuting), and
in a myriad of functional capacities (body movement, stability, comfort, compatibility,
vulnerability, maintainability, safety, confidence). Each of the functional capacities may be
covered by several specific questions on the degree of difficulty in performing specific tasks
(moving the head, keeping balance, staying cool/warm, interfering with rifle firing, seeing,
providing camouflage).

b. The test situation calls for a multivariate analysis that would not only uncover
major problems with the test/control item, but also identify possible interaction effects; the
test system may be of greater utility in one environment and of lesser utility in another
environment, with the opposite being true for the control system. Given a coordinated test
program where methods and instrumentation are standardized (e.g., potentiometer slide and
taped instructions and questions), ratio-scaled subjective data could be analyzed, for
instance, in a 4 (environments) x 3 (tactics) x 8 (functions) x 5 (tasks nested within each
function) factorial design, a powerful analytical tool not legitimate for typical subjective data.
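
The size and layout of the 4 x 3 x 8 x 5 arrangement can be made concrete by enumerating its cells. This is a minimal sketch: the level names are taken from the examples in the text, the tasks are represented only by hypothetical task numbers 1 through 5 within each function, and the arithmetic mean is a placeholder summary rather than a recommended statistic.

```python
from collections import defaultdict
from itertools import product

environments = ["temperate", "humid tropic", "arctic", "desert"]
tactics = ["attack", "defense", "parachuting"]
functions = ["body movement", "stability", "comfort", "compatibility",
             "vulnerability", "maintainability", "safety", "confidence"]
TASKS_PER_FUNCTION = 5  # nested: each function has its own five tasks

# Each cell is identified by (environment, tactic, function, task number).
cells = [
    (env, tac, fn, task)
    for env, tac, fn in product(environments, tactics, functions)
    for task in range(1, TASKS_PER_FUNCTION + 1)
]

def cell_means(observations):
    """Average the ratio-scaled responses recorded in each cell.

    observations: iterable of (cell, response) pairs.
    """
    grouped = defaultdict(list)
    for cell, response in observations:
        grouped[cell].append(response)
    return {cell: sum(vals) / len(vals) for cell, vals in grouped.items()}

# Hypothetical responses from two soldiers for a single cell.
example_cell = ("humid tropic", "attack", "comfort", 3)
means = cell_means([(example_cell, 40.0), (example_cell, 60.0)])

print(len(cells), means[example_cell])
```

Every soldier-item comparison would supply one ratio-scaled response per cell, so the design has 480 cells per item before replication across soldiers.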